# Mixture of Experts Architecture

**Bytedance BAGEL 7B MoT INT8** · Apache-2.0 · Gapeleon · 190 downloads · 20 likes
BAGEL is an open-source multimodal foundation model with 7B active parameters, supporting both multimodal understanding and generation tasks.
Tags: Text-to-Image

**BAGEL 7B MoT** · Apache-2.0 · ByteDance-Seed · 4,736 downloads · 769 likes
BAGEL is an open-source multimodal foundation model with 7B active parameters, trained on large-scale interleaved multimodal data and excelling at both understanding and generation tasks.
Tags: Text-to-Image

**Qwen3 1.7B GGUF** · Apache-2.0 · prithivMLmods · 357 downloads · 1 like
Qwen3 is the latest generation of the Tongyi Qianwen (Qwen) large language model series, offering both dense and Mixture of Experts (MoE) models. Built on large-scale training, Qwen3 delivers marked improvements in reasoning, instruction following, agent capabilities, and multilingual support.
Tags: Large Language Model, English

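GGUF builds like the one above are typically run locally with llama.cpp or one of its bindings. Below is a minimal, hedged sketch using the llama-cpp-python binding; the file name is illustrative only (check the repository for the actual quantization files), and the sampling settings are arbitrary.

```python
# Minimal sketch: running a GGUF quantization locally with llama-cpp-python.
# The model path is a placeholder; download a real GGUF file from the repository
# listed above and point model_path at it.
from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen3-1.7B-Q4_K_M.gguf",  # illustrative file name
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain mixture-of-experts in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```
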
**Ling Lite 1.5** · MIT · inclusionAI · 46 downloads · 3 likes
Ling is a large-scale Mixture of Experts (MoE) language model open-sourced by InclusionAI. The Lite version has 16.8 billion total parameters with 2.75 billion activated parameters and delivers strong performance for its size.
Tags: Large Language Model, Transformers

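The total-versus-active parameter split quoted in entries like Ling Lite (16.8B total, 2.75B activated) comes from sparse routing: each token is sent to only a few of the available expert MLPs. The PyTorch sketch below is a schematic illustration of top-k gating, not the implementation used by any specific model on this page.

```python
# Schematic top-k mixture-of-experts layer (illustration only): every expert's
# weights exist ("total parameters"), but each token runs through only top_k of
# them ("active parameters"), which is how a 16.8B model can cost ~2.75B per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=256, d_ff=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # dispatch each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 256)
print(TopKMoE()(tokens).shape)  # torch.Size([4, 256])
```
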
**Qwen3 30B A7.5B 24 Grand Brainstorm** · DavidAU · 55 downloads · 7 likes
A fine-tune of the Qwen3-30B-A3B Mixture of Experts model with the number of active experts raised from 8 to 24, intended for complex tasks that require deep reasoning.
Tags: Large Language Model, Transformers

**Qwen3 30B A6B 16 Extreme 128k Context** · DavidAU · 72 downloads · 7 likes
A fine-tune of the Qwen3-30B-A3B Mixture of Experts model with the number of active experts raised to 16 and the context window extended to 128k, intended for complex reasoning scenarios.
Tags: Large Language Model, Transformers

**Qwen3 30B A1.5B High Speed** · DavidAU · 179 downloads · 7 likes
A speed-optimized variant of Qwen3-30B that roughly doubles inference speed by reducing the number of active experts, intended for text generation scenarios that need fast responses.
Tags: Large Language Model, Transformers

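The three DavidAU variants above differ mainly in how many experts fire per token on top of the same Qwen3-30B-A3B base. As a hedged illustration of the knob involved, the sketch below reads and overrides `num_experts_per_tok` on the base model's config; the repo id and the attribute name are assumptions (the attribute name is the one used by several MoE configs in transformers), and the published variants bake such changes into their checkpoints rather than overriding at load time.

```python
# Hypothetical sketch: changing how many routed experts are activated per token
# for a Qwen3-30B-A3B-style MoE checkpoint. The repo id and config attribute are
# assumptions; verify both against the actual checkpoint before relying on this.
from transformers import AutoConfig, AutoModelForCausalLM

repo = "Qwen/Qwen3-30B-A3B"  # assumed base repo id, for illustration only
config = AutoConfig.from_pretrained(repo)
print("default experts per token:", getattr(config, "num_experts_per_tok", "n/a"))

# More experts per token: slower inference, potentially deeper reasoning.
config.num_experts_per_tok = 16
model = AutoModelForCausalLM.from_pretrained(
    repo, config=config, device_map="auto", torch_dtype="auto"
)
```
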
**Qwen3 235B A22B AWQ** · Apache-2.0 · cognitivecomputations · 2,563 downloads · 9 likes
Qwen3-235B-A22B is the latest-generation large language model in the Qwen series, built on a Mixture of Experts (MoE) architecture with 235 billion total parameters and 22 billion active parameters. It excels at reasoning, instruction following, agent capabilities, and multilingual support.
Tags: Large Language Model, Transformers

**Nomic Embed Text V2 Moe GGUF** · Apache-2.0 · nomic-ai · 14.06k downloads · 13 likes
A multilingual Mixture of Experts text embedding model that supports roughly 100 languages and performs strongly on multilingual retrieval.
Tags: Text Embedding, Multilingual

**Nomic Embed Text V2 GGUF** · Apache-2.0 · ggml-org · 317 downloads · 3 likes
Nomic Embed Text V2 GGUF is a multilingual text embedding model supporting over 70 languages, suited to sentence similarity and feature extraction tasks.
Tags: Text Embedding, Multilingual

**Qwen3 235B A22B GGUF** · MIT · ubergarm · 889 downloads · 16 likes
Qwen3-235B-A22B is a 235-billion-parameter large language model; this build applies advanced non-linear quantization via the ik_llama.cpp fork and targets high-performance computing environments.
Tags: Large Language Model

**Qwen3 235B A22B** · Apache-2.0 · Qwen · 159.10k downloads · 849 likes
Qwen3 is the latest generation of the Tongyi Qianwen (Qwen) large language model series, offering a full suite of dense and Mixture of Experts (MoE) models and delivering marked improvements in reasoning, instruction following, agent capabilities, and multilingual support.
Tags: Large Language Model, Transformers

**MAI DS R1 GGUF** · MIT · unsloth · 916 downloads · 4 likes
MAI-DS-R1 is the DeepSeek-R1 reasoning model further trained by Microsoft's AI team to improve its responsiveness on restricted topics and its risk profile, while preserving its reasoning capabilities and competitive performance.
Tags: Large Language Model

**Llama3.1 MOE 4X8B Gated IQ Multi Tier COGITO Deep Reasoning 32B GGUF** · Apache-2.0 · DavidAU · 829 downloads · 2 likes
A Mixture of Experts (MoE) model with adjustable reasoning that combines four Llama 3.1 8B models to strengthen inference and text generation.
Tags: Large Language Model, Multilingual

**MAI DS R1** · MIT · microsoft · 8,840 downloads · 250 likes
MAI-DS-R1 is the result of Microsoft AI's post-training of the DeepSeek-R1 reasoning model, aimed at improving its responsiveness on sensitive topics and its risk profile while preserving the original model's reasoning ability and competitive strengths.
Tags: Large Language Model, Transformers

**Llama 4 Scout 17B 16E Linearized Bnb Nf4 Bf16** · Other · axolotl-quants · 6,861 downloads · 3 likes
Llama 4 Scout is a Mixture of Experts (MoE) model from Meta with 17 billion active parameters, supporting multilingual text and image understanding; its expert modules are linearized here for PEFT/LoRA compatibility.
Tags: Multimodal Fusion, Transformers, Multilingual

**Llama 4 Scout 17B 16E Unsloth** · Other · unsloth · 67 downloads · 1 like
Llama 4 Scout is a multimodal AI model from Meta with 17 billion active parameters and a Mixture of Experts architecture, supporting 12 languages and image understanding.
Tags: Text-to-Image, Transformers, Multilingual

**Doge 120M MoE Instruct** · Apache-2.0 · SmallDoge · 240 downloads · 1 like
The Doge models use dynamic masked attention for sequence transformation and can use either multi-layer perceptrons or cross-domain mixture of experts for state transformation.
Tags: Large Language Model, Transformers, English

**Llama 4 Maverick 17B 128E** · Other · meta-llama · 3,261 downloads · 69 likes
Llama 4 Maverick is a multimodal AI model from Meta built on a Mixture of Experts architecture, supporting text and image understanding, with 17 billion active parameters and about 400 billion total parameters.
Tags: Text-to-Image, Transformers, Multilingual

**Llama 4 Maverick 17B 128E Instruct** · Other · meta-llama · 87.79k downloads · 309 likes
Llama 4 Maverick is a multimodal AI model from Meta with 17 billion active parameters and a Mixture of Experts (MoE) architecture using 128 experts, supporting multilingual text and image understanding.
Tags: Large Language Model, Transformers, Multilingual

**Deepseek V3 0324 GGUF** · MIT · unsloth · 108.44k downloads · 177 likes
DeepSeek-V3-0324 is the March 2025 update of DeepSeek-V3 from the DeepSeek team, showing clear gains over its predecessor on multiple benchmarks; dynamic quantization builds are provided for local inference.
Tags: Large Language Model, English

**Llm Jp 3 8x13b Instruct3** · Apache-2.0 · llm-jp · 162 downloads · 3 likes
A large-scale Japanese-English MoE language model developed by Japan's National Institute of Informatics, with an 8x13B parameter configuration and instruction fine-tuning.
Tags: Large Language Model, Transformers, Multilingual

**Qwen2.5 MOE 2X1.5B DeepSeek Uncensored Censored 4B Gguf** · Apache-2.0 · DavidAU · 678 downloads · 5 likes
A Qwen2.5 MOE (Mixture of Experts) model that combines two Qwen 2.5 DeepSeek 1.5B models (one censored/regular, one uncensored) into a 4B model in which the uncensored DeepSeek Qwen 2.5 1.5B dominates the model's behavior.
Tags: Large Language Model, Multilingual

**Hiber Multi 10B Instruct** · Hibernates · 86 downloads · 2 likes
Hiber-Multi-10B-Instruct is an advanced multilingual large language model built on the Transformer architecture, with 10 billion parameters and support for multiple languages, suited to text generation tasks.
Tags: Large Language Model, Transformers, Multilingual

**Nomic Embed Text V2 Moe Unsupervised** · nomic-ai · 161 downloads · 5 likes
An intermediate checkpoint of a multilingual Mixture of Experts (MoE) text embedding model, produced by multi-stage contrastive training.
Tags: Text Embedding

**Nomic Embed Text V2 Moe** · Apache-2.0 · nomic-ai · 242.32k downloads · 357 likes
Nomic Embed v2 is a high-performance multilingual Mixture of Experts (MoE) text embedding model that supports roughly 100 languages and excels at multilingual retrieval.
Tags: Text Embedding, Multilingual

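As a rough sketch of how an embedding model like this is used for multilingual retrieval, the snippet below goes through sentence-transformers. The repo id is taken from the entry above, while `trust_remote_code=True` and the omission of task prefixes are assumptions; follow the model card for the exact recommended query and document prompts.

```python
# Minimal sketch: multilingual similarity with a MoE text-embedding model via
# sentence-transformers. trust_remote_code and prompt handling are assumptions;
# the model card documents the exact query/document prefixes to use.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

docs = [
    "Mixture-of-experts models activate only a few experts per token.",
    "Los modelos de mezcla de expertos activan solo algunos expertos por token.",
]
query = "How do MoE language models keep inference cheap?"

doc_emb = model.encode(docs)
query_emb = model.encode(query)
print(util.cos_sim(query_emb, doc_emb))  # cosine similarity against each document
```
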
**Deepseek R1** · MIT · deepseek-ai · 1.7M downloads · 12.03k likes
DeepSeek-R1 is DeepSeek's first-generation reasoning model. Trained with large-scale reinforcement learning, it performs strongly on mathematics, code, and reasoning tasks.
Tags: Large Language Model, Transformers

**Falcon3 MoE 2x7B Insruct** · Other · ehristoforu · 273 downloads · 10 likes
A Mixture of Experts model built from Falcon3 7B-IT and 7B-IT, totaling 13.4 billion parameters, supporting English, French, Spanish, and Portuguese with a context length of up to 32K.
Tags: Large Language Model, Safetensors, English

**Llama 3.2 4X3B MOE Ultra Instruct 10B GGUF** · Apache-2.0 · DavidAU · 277 downloads · 7 likes
A Mixture of Experts model based on Llama 3.2 that combines four 3B models into a 10B-parameter model, supporting 128k context and performing well at instruction following and general-purpose generation.
Tags: Large Language Model, English

**Timemoe 200M** · Apache-2.0 · Maple728 · 14.01k downloads · 7 likes
TimeMoE-200M belongs to the TimeMoE family of billion-scale, Mixture of Experts (MoE) based time series foundation models and focuses on time series forecasting.
Tags: Time Series Forecasting

**Chartmoe** · Apache-2.0 · IDEA-FinAI · 250 downloads · 12 likes
ChartMoE is a multimodal large language model based on InternLM-XComposer2, using a Mixture of Experts connector and offering advanced chart understanding capabilities.
Tags: Image-to-Text, Transformers

**Deepseek V2 Lite** · ZZichen · 20 downloads · 1 like
DeepSeek-V2-Lite is an economical Mixture of Experts (MoE) language model with 16B total parameters and 2.4B active parameters, supporting a 32k context length.
Tags: Large Language Model, Transformers

**Mixtral 8x22B V0.1 GGUF** · Apache-2.0 · MaziyarPanahi · 170.27k downloads · 74 likes
Mixtral 8x22B is a sparse Mixture of Experts model released by Mistral AI (around 141 billion total parameters, 39 billion active), supporting multilingual text generation.
Tags: Large Language Model, Multilingual

**Dbrx Instruct** · Other · databricks · 5,005 downloads · 1,112 likes
A Mixture of Experts (MoE) large language model developed by Databricks, specialized in few-turn interactions.
Tags: Large Language Model, Transformers

**Dbrx Base** · Other · databricks · 100 downloads · 557 likes
A Mixture of Experts (MoE) large language model developed by Databricks, with 132 billion total parameters, 36 billion active parameters, and a 32K context window.
Tags: Large Language Model, Transformers

**Xlam V0.1 R** · Salesforce · 190 downloads · 53 likes
xLAM-v0.1 is a major upgrade in the Large Action Model series, fine-tuned across a wide range of agent tasks and scenarios while keeping the same parameter count and preserving the original model's capabilities.
Tags: Large Language Model, Transformers

**Openbuddy Mixtral 7bx8 V18.1 32k GGUF** · Apache-2.0 · nold · 79 downloads · 2 likes
OpenBuddy is an open multilingual chatbot model based on the Mixtral-8x7B architecture, suited to multilingual dialogue scenarios.
Tags: Large Language Model, Multilingual

**Moe LLaVA Qwen 1.8B 4e** · Apache-2.0 · LanguageBind · 176 downloads · 14 likes
MoE-LLaVA is a large vision-language model built on a Mixture of Experts architecture, achieving efficient multimodal learning through sparsely activated parameters.
Tags: Text-to-Image, Transformers

**Discolm Mixtral 8x7b V2** · Apache-2.0 · DiscoResearch · 205 downloads · 124 likes
An experimental 8x7B Mixture of Experts model based on Mistral AI's Mixtral 8x7b, fine-tuned on the Synthia, MetaMathQA, and Capybara datasets.
Tags: Large Language Model, Transformers, English

**Mixtral 7b 8expert** · Apache-2.0 · DiscoResearch · 57.47k downloads · 264 likes
A Mixture of Experts (MoE) model released by MistralAI (Mixtral 8x7B), supporting multilingual text generation tasks.
Tags: Large Language Model, Transformers, Multilingual